Annotating Characters in Literary Corpora: A Scheme, the CHARLES Tool, and an Annotated Novel
Authors
Abstract
Characters form the focus of various studies of literary works, including social network analysis, archetype induction, and plot comparison. The recent rise in the computational modelling of literary works has produced a corresponding rise in the demand for character-annotated literary corpora. However, automatically identifying characters is an open problem, and few literary texts with manually labelled characters are available. To address the latter problem, this work presents three contributions: (1) a comprehensive scheme for manually resolving mentions to characters in texts; (2) a novel collaborative annotation tool, CHARLES (CHAracter Resolution Label-Entry System), for character annotation and similar cross-document tagging tasks; and (3) the character annotations resulting from a pilot study on the novel Pride and Prejudice, demonstrating that the scheme and tool facilitate the efficient production of high-quality annotations. We expect this work to motivate the further production of annotated literary corpora to help meet the demand of the community.
Similar resources
Building annotated resources for automatic text summarisation
Annotated corpora are necessary for automatic summarisation, but given how difficult it is to produce them, only a few are available. This paper presents an annotation tool which helps the human annotator to select the important units from a text. In addition to the tool, a new annotation scheme is proposed so that phenomena such as the presence of anaphoric expressions and redundancy can be ...
An Investigation into the Use of Category Shifts in the Persian Translation of Charles Dickens' Great Expectations
The present study aimed at finding Catford's category shifts applied in the Persian translation of Charles Dickens' novel Great Expectations to determine the most frequently used category shift and to check whether there is a significant difference between category shifts in the translation. To this end, 200 simple declarative sentences from the first 20 chapters...
Argumentation Mining on the Web from Information Seeking Perspective
In this paper, we argue that an annotation scheme for argumentation mining is a function of the task requirements and the corpus properties. There is no one-size-fits-all argumentation theory to be applied to realistic data on the Web. In two annotation studies, we experiment with 80 German newspaper editorials from the Web and about one thousand English documents from forums, comments, and blog...
Annotating Particle Realization and Ellipsis in Korean
We present a novel scheme for annotating the realization and ellipsis of Korean particles. Annotated data include 100,128 Ecel (a space-based word unit) in spoken and written corpora composed of four different genres in order to evaluate how register variation contributes to Korean particle ellipsis. Identifying the grammatical functions of particles and zero particles is critical for deriving a...
Arabic anaphora resolution: corpora annotation with coreferential links
Annotated resources are much needed for the evaluation and training of anaphora resolution systems. Coreferential chain annotation is a difficult task which cannot be carried out without an appropriate tool. In this paper, we present our work on Arabic corpora annotation with anaphoric links (i.e., the annotation of the identity relation between the anaphors and their antecedents). In particular,...